163 research outputs found

    NAS-VAD: Neural Architecture Search for Voice Activity Detection

    Various neural network-based approaches have been proposed for more robust and accurate voice activity detection (VAD). Manual design of such neural architectures is an error-prone and time-consuming process, which prompted the development of neural architecture search (NAS), which automatically designs and optimizes network architectures. While NAS has been successfully applied to improve performance in a variety of tasks, it has not yet been exploited in the VAD domain. In this paper, we present the first work that utilizes NAS approaches on the VAD task. To effectively search architectures for the VAD task, we propose a modified macro structure and a new search space with a much broader range of operations that includes attention operations. The results show that the network structures found by the proposed NAS framework outperform previous manually designed state-of-the-art VAD models on various noise-added and real-world-recorded datasets. We also show that the architectures searched on a particular dataset achieve improved generalization performance on unseen audio datasets. Our code and models are available at https://github.com/daniel03c1/NAS_VAD. Comment: Submitted to Interspeech 202

    Neural Residual Flow Fields for Efficient Video Representations

    Neural fields have emerged as a powerful paradigm for representing various signals, including videos. However, research on improving the parameter efficiency of neural fields is still in its early stages. Even though neural fields that map coordinates to colors can be used to encode video signals, this scheme does not exploit the spatial and temporal redundancy of video signals. Inspired by standard video compression algorithms, we propose a neural field architecture for representing and compressing videos that deliberately removes data redundancy through the use of motion information across video frames. Maintaining motion information, which is typically smoother and less complex than color signals, requires far fewer parameters. Furthermore, reusing color values through motion information further improves the network parameter efficiency. In addition, we suggest using more than one reference frame for video frame reconstruction and separate networks, one for optical flows and the other for residuals. Experimental results show that the proposed method outperforms the baseline methods by a significant margin. The code is available at https://github.com/daniel03c1/eff_video_representation. Comment: Accepted for ACCV 2022.
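
The idea of reusing color values through motion information can be sketched, very roughly, as warping a reference frame with a flow field and adding a residual. The NumPy snippet below is only an illustration under several assumptions that the abstract does not state: a single reference frame, nearest-neighbor backward warping, border clamping, and the hypothetical function name `reconstruct_frame`. In the described method the flow and residual would come from two separate networks; here they are plain arrays.

```python
import numpy as np

def reconstruct_frame(reference, flow, residual):
    """Rebuild a frame as warp(reference, flow) + residual.

    reference: (H, W, 3) float array, a previously decoded frame.
    flow:      (H, W, 2) float array of per-pixel offsets (dy, dx).
    residual:  (H, W, 3) float array correcting what warping cannot explain.
    All three are assumed inputs; this is not the authors' implementation.
    """
    h, w, _ = reference.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Backward warp: each target pixel reads the reference at (y + dy, x + dx),
    # rounded to the nearest pixel and clamped to the image border.
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    warped = reference[src_y, src_x]

    return warped + residual
```

With zero flow and zero residual the function returns the reference frame unchanged, so the residual only needs to encode what motion alone cannot reproduce.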

    Understanding Contrastive Learning Through the Lens of Margins

    Contrastive learning, along with its variations, has been a highly effective self-supervised learning method across diverse domains. Contrastive learning measures the distance between representations using cosine similarity and uses cross-entropy for representation learning. Within the same framework of cosine-similarity-based representation learning, margins have played a significant role in enhancing face and speaker recognition tasks. Interestingly, despite the shared reliance on the same similarity metrics and objective functions, contrastive learning has not actively adopted margins. Furthermore, decision-boundary-based explanations are the only ones that have been used to explain the effect of margins in contrastive learning. In this work, we propose a new perspective to understand the role of margins based on gradient analysis. Based on the new perspective, we analyze how margins affect gradients of contrastive learning and separate the effect into more elemental levels. We separately analyze each and provide possible directions for improving contrastive learning. Our experimental results demonstrate that emphasizing positive samples and scaling gradients depending on positive sample angles and logits are the keys to improving the generalization performance of contrastive learning on both seen and unseen datasets, and other factors can only marginally improve performance.
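
As a rough illustration of the setup this abstract describes (cosine similarity between representations, cross-entropy as the objective, and a margin applied to the positive pair), the NumPy sketch below computes an InfoNCE-style loss. The function name `margin_contrastive_loss`, the additive angular margin form (borrowed from face-recognition losses), the temperature value, and the batch layout are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def margin_contrastive_loss(z_a, z_b, temperature=0.1, margin=0.0):
    """InfoNCE-style loss with an additive angular margin on positive pairs.

    z_a, z_b: (batch, dim) arrays; row i of z_a and row i of z_b are two
    views of the same sample (the positive pair). All other rows of z_b
    act as negatives for row i of z_a.
    """
    # L2-normalize so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    cos = np.clip(z_a @ z_b.T, -1.0, 1.0)  # (batch, batch)

    # Replace each positive logit cos(theta) with cos(theta + margin).
    # The angle is capped at pi so the logit never increases with the margin.
    idx = np.arange(cos.shape[0])
    theta = np.arccos(cos[idx, idx])
    logits = cos.copy()
    logits[idx, idx] = np.cos(np.minimum(theta + margin, np.pi))
    logits = logits / temperature

    # Cross-entropy with the diagonal as the target class (stable log-softmax).
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, idx].mean()
```

A positive margin lowers the positive logit, so for the same inputs the loss is larger than without a margin, which pushes positive pairs to be pulled closer during training.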

    Hexa: Self-Improving for Knowledge-Grounded Dialogue System

    A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web search, memory retrieval) with modular approaches. However, data for such steps are often less accessible than dialogue response data, as they are unobservable in an ordinary dialogue. To compensate for the absence of these data, we develop a self-improving method that improves the generative performance of intermediate steps without ground-truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves performance on the task of knowledge-grounded dialogue generation.

    Highly Clumpy Structure of the Thermal Composite Supernova Remnant 3C391 Unveiled by Chandra

    The nature of the internal thermal X-ray emission seen in "thermal composite" supernova remnants is still uncertain. A Chandra observation of 3C391 shows a southeast-northwest elongated morphology and unveils a highly clumpy structure of the remnant. Detailed spatially resolved spectral analysis of the small-scale features reveals normal metal abundance and uniform temperature for the interior gas. The properties of the hot gas comparatively favor the cloudlet evaporation model as a main mechanism for the "thermal composite" X-ray appearance, though radiative rim and thermal conduction may also be effective. A faint protrusion is found in Si and S lines out of the southwest radio border. Comment: 7 pages, 4 embedded figures, in COSPAR 2004 session E1.4, "Young Neutron Stars and Supernova Remnants", Advances in Space Research, in pres

    Prospects for Pentaquark Production at Meson Factories

    Following Rosner [hep-ph/0312269], we consider B-decay production channels for the exotic I=0 and $I=3/2$ pentaquarks that have been recently reported. We also discuss new search channels for isovector pentaquarks, such as the $\Theta^{*++}$ $(\bar s duuu)$, that are generically present in chiral soliton models but were not observed in recent experiments. Furthermore, we argue that weak decays of charmed baryons, such as the $\Lambda_c^+$ and $\Xi_c^0$, provide another clean way of detecting exotic baryons made of light quarks only. We also discuss discovery channels for charmed pentaquarks, such as the isosinglet $\Theta_c^0$ $(\bar c udud)$, in weak decays of bottom mesons and baryons. Finally, we discuss prospects for inclusive production of pentaquarks in $e^+ e^-$ collisions, with associated production of particles carrying the opposite baryon number. Comment: 15 pages, LaTeX; v2,v3: minor corrections, references added; v4: minor modifications, the version published in Physics Letters

    Interstellar Silicate Dust in the z=0.89 Absorber Towards PKS 1830-211: Crystalline Silicates at High Redshift?

    We present evidence of a >10-sigma detection of the 10 micron silicate dust absorption feature in the spectrum of the gravitationally lensed quasar PKS 1830-211, produced by a foreground absorption system at redshift 0.886. We have examined more than 100 optical depth templates, derived from both observations of Galactic and extragalactic sources and laboratory measurements, in order to constrain the chemical structure of the silicate dust. We find that the best fit to the observed absorption profile is produced by laboratory crystalline olivine, with a corresponding peak optical depth of tau_10=0.27+/-0.05. The fit is slightly improved by including small contributions from additional materials such as silica, enstatite, or serpentine, which suggests that the dust composition may consist of a blend of crystalline silicates. Combining templates for amorphous and crystalline silicates, we find that the fraction of crystalline silicates needs to be at least 95%. Given the rarity of extragalactic sources with such a high degree of silicate crystallinity, we also explore the possibility that the observed spectral features are produced by amorphous silicates in combination with other molecular or atomic transitions, or by foreground source contamination. While we cannot rule out these latter possibilities, they lead to much poorer profile fits than the crystalline olivine templates. If the presence of crystalline interstellar silicates in this distant galaxy is real, it would be highly unusual, given that the Milky Way interstellar matter contains essentially only amorphous silicates. It is possible that the z=0.886 absorber towards PKS 1830-211, well known for its high molecular content, has a unique star-forming environment that enables crystalline silicates to form and prevail. Comment: 67 pages, 21 figures, accepted for publication in the Astrophysical Journa